Welcome to the MiamiR software package introductory tutorial, for all yourGWAS and wider genetic data visualisation needs!
Firstly, the recommended installation for this package is as follows:
Standard Method: remotes::install_github(“CRON-Project/MiamiR”)
Alternative Method: BiocManager::install(“CRON-Project/MiamiR”)
If you have any issues please raise issues through GitHub or contact me at: spet5590@ox.ac.uk with subject heading indicating issue.
Once installed, load in the MiamiR package (lib.loc may need to be specified), This package is complete with extensive documentation and functionality.
A web app version will also be available soon (TBC).
library(MiamiR)
Here’s how to create a Manhattan plot of the included exemplar ‘Intelligence_Sum_Stats’ summary statistics data set using the Single_Plot() function included in the package.
This function will return a plot object which we’ll manually save to a .jpg file in the vignettes/Plots directory of the package for future use and user access.
Initially, all automated, default settings will be used.
A progress bar will display as the function runs. This UI is the default for all MiamiR functions.
More detailed outputs can be viewed via setting the Verbose argument to TRUE for any MiamiR function.
plots_dir <- here("vignettes", "Plots")
Manhattan_Plot <- Single_Plot(Data = Intelligence_Sum_Stats)
setwd(plots_dir)
ggplot2::ggsave("Intelligence_Plot.jpg", plot = Manhattan_Plot, width = 30,
height = 15, units = "in", dpi = 100)
Let’s inspect our saved Manhattan Plot.
knitr::include_graphics(paste0(plots_dir,"/Intelligence_Plot.jpg"))
The Single_Plot() function also allows for a range of customisation. Here is a small example of such functionality.
It is also worth noting that running ?Single_Plot() or ? for any function or component in MiamiR will provide documentation on all functionality available and any relevant information on in-house data sets used.
Manhattan_Plot <- Single_Plot(Data = Intelligence_Sum_Stats, X_Axis_Title = "Region",
Title = "Custom Plot", Label_Height = 15,
Anchor_Label = "centre", Label_Angle = 90,
Chromosome_Colours = c("red", "orange"),
Colour_Of_Index = "blue", Diamond_Index = T,
Diamond_Index_Size = 10)
setwd(plots_dir)
ggplot2::ggsave("Intelligence_Plot_Custom.jpg", plot = Manhattan_Plot, width = 30,
height = 15, units = "in", dpi = 100)
Let’s inspect our customised Manhattan Plot.
knitr::include_graphics(paste0(plots_dir,"/Intelligence_Plot_Custom.jpg"))
Files can also be fed in for all MiamiR functions using variables containing a file path or a raw file path itself, in addition to multiple files or current environment objects, whilst allowing for varied customisation.
Here, raw file paths for the above summary statistics are used alongside another included data set called ‘Household_Income_Sum_Stats’, with alternative customisation demonstrated as well.
Automated and unspecified defaults are also subtly adjust for different inputs across all functions.
data_dir <- here::here("inst", "extdata", "Example_Data_Raw")
setwd(data_dir)
Sum_Stats <- c(paste0(data_dir, "/Intelligence_Sum_Stats.txt"),
paste0(data_dir, "/Household_Income_Sum_Stats.txt"))
Manhattan_Plot <- Single_Plot(Data = Sum_Stats, Label_Angle = 60, Label_Colour = "grey",
Sig_Line_Type = "solid", Condense_Scale = F )
setwd(plots_dir)
output_files <- c()
for (plot_name in names(Manhattan_Plot)) {
file_path <- paste0(plot_name, ".jpg")
ggplot2::ggsave(
filename = file_path,
plot = Manhattan_Plot[[plot_name]],
width = 30, height = 15, units = "in", dpi = 100
)
output_files <- c(output_files, file_path)
}
Let’s inspect our saved, customised Manhattan Plots.
imgs <- file.path(plots_dir, output_files)
carousel(imgs)
There is also a Regional_Plot() function built around the above functionality, which has unique features in addition to seamlessly inheriting the Single_Plot() arguments to produce annotated regional plots.
Here is how to run the function, which automatically ascertains regions of interest, with this example focusing on key regions in chromosomes 10 and 22 of the intelligence GWAS summary statistics.
We will also save all generated objects as correctly sized images with guide measures also returned by the function and supplied as part of the plot objects.
Although defaults are suitable it is very important to specify the correct genome build (in this case HG19) for accurate recombination and gene annotation tracks/overlays, in addition to queried LD, where needed.
Regional_Plots <- Regional_Plot(Data = Intelligence_Sum_Stats,
Genome_Build = "grch37",
Chromosomes = c(10,22))
for (plot_name in names(Regional_Plots)) {
plot <- Regional_Plots[[plot_name]]
file_safe <- gsub("[^A-Za-z0-9]", "_", plot_name)
filename <- paste0(file_safe, ".jpg")
height_to_use <- attr(plot, "dynamic_height")
if (is.null(height_to_use)) height_to_use <- 30
setwd(plots_dir)
ggplot2::ggsave(
filename = filename,
plot = plot,
width = 30,
height = height_to_use,
units = "in",
dpi = 100,
limitsize = FALSE
)
}
Let’s inspect our Regional Plots for the aforementioned chromosomes.
img_dir <- plots_dir
img_paths <- list.files(
path = img_dir,
pattern = "Regional_Plot\\.jpg$",
recursive = FALSE,
full.names = TRUE
)
img_paths <- img_paths[order(file.info(img_paths)$mtime)]
carousel(img_paths)
One key customisation of the Regional_Plot() function is the ability to query for LD values around the lead SNP (with settings available for build, population etc.) and colour points in the top panel accordingly.
For simplicity and speed let’s focus on the single chromosome 22 peak.
Please bear in mind that querying an API is slow and may not always contain data pertaining to your particular set of SNPs or may be unavailable within certain system paramaters.
It is often better to produce your own LD matrix using tools such as PLINK1.9/2, the results of which can also be fed in to this package (FUTURE/TBC).
Regional_Plots <- Regional_Plot(Data = Sum_Stats[1], Auto_LD = TRUE, Chromosomes = c(22),
Genome_Build = "grch37", Gene_Biotype = c("coding", "snorna"),
Gene_Biotype_Colour = c("red", "green" ),
Sense_Arrow_Colour = "darkorange",
Gene_Structure_Size_Exons = 20,
Gene_Legend_Location = "Top Left",
Gene_Legend_Title = "Categories",
Y_Axis_Title = "LOG10Pvalue", Diamond_Index_Size = 20)
for (plot_name in names(Regional_Plots)) {
plot <- Regional_Plots[[plot_name]]
file_safe <- gsub("[^A-Za-z0-9]", "_", plot_name)
filename <- paste0(file_safe, "_LD", ".jpg")
height_to_use <- attr(plot, "dynamic_height")
if (is.null(height_to_use)) height_to_use <- 30
setwd(plots_dir)
ggplot2::ggsave(
filename = filename,
plot = plot,
width = 30,
height = height_to_use,
units = "in",
dpi = 100,
limitsize = FALSE
)
}
Let’s have a look at our heavily customised and LD annotated Regional Plot.
img_dir <- plots_dir
img_paths <- list.files(
path = img_dir,
pattern = "Regional_Plot_LD\\.jpg$",
recursive = FALSE,
full.names = TRUE
)
img_paths <- img_paths[order(file.info(img_paths)$mtime)]
knitr::include_graphics(img_paths)
Let’s now pass both sets of summary statistics we used in the earlier example to show how regional plots can be constructed on distinct data sets simultaneously.
Both Intelligence_Sum_Stats and Household_Income_Sum_Stats seem to have a single significant association on chromosome 9.
Let’s see if they are in similar genomic locations…
For single vector arguments like Point_Colour, a list can be allocated in the order of customisation of data required.
Point_Colour = c(“blue”, “red”) will allocate blue to the points in Intelligence_Sum_Stats and red to the points in Household_Income_Sum_Stats. Any singularly specified arguments will apply globally.
Arguments which require a nested list structure, like Gene_Biotype can have two separate listed arguments passed by allotting e.g. Gene_Biotype = list( c(), c(“coding”, “processed_pseudogene” )).
This will allocate NULL/default settings to Intelligence_Sum_Stats and stratify the Household_Income_Sum_Stats plot by only “coding” and “processed_pseudogene” regions.
Please note that HG19 is passed as the genome build for Intelligence_Sum_Stats but HG38 for Household_Income_Sum_Stats due to differences in the data sets’ references.
Regional_Plots <- Regional_Plot(Data = c(Intelligence_Sum_Stats, Household_Income_Sum_Stats),
Chromosomes = c(9), Genome_Build = c("grch37", "grch38"),
Recombination_Line_Colour = c("darkgrey", "pink"),
Point_Colour = c("blue", "red"), Y_Axis_Title = c("SigVal"),
Gene_Biotype = list( c(), c("coding", "processed_pseudogene" )),
Gene_Biotype_Colour = list( c(), c("blue", "green")) )
for (plot_name in names(Regional_Plots)) {
plot <- Regional_Plots[[plot_name]]
file_safe <- gsub("[^A-Za-z0-9]", "_", plot_name)
filename <- paste0(file_safe, "_LmultiD", ".jpg")
height_to_use <- attr(plot, "dynamic_height")
if (is.null(height_to_use)) height_to_use <- 30
setwd(plots_dir)
ggplot2::ggsave(
filename = filename,
plot = plot,
width = 30,
height = height_to_use,
units = "in",
dpi = 100,
limitsize = FALSE
)
}
Let’s have a look at both chromosome 9 Regional Plots for our exemplar data sets.
img_dir <- plots_dir
img_paths <- list.files(
path = img_dir,
pattern = "Regional_Plot_LmultiD\\.jpg$",
recursive = FALSE,
full.names = TRUE
)
img_paths <- img_paths[order(file.info(img_paths)$mtime)]
carousel(img_paths)